Creating an Appropriate Corpus for PP Attachment Training

Author

  • Brian Mitchell
Abstract

This paper describes work in progress to identify shortcomings of existing Prepositional Phrase (PP) attachment algorithms and to produce a new resource derived from the Penn TreeBank (PTB) corpus. The aim is to use this new resource (PTB Prime) to improve the accuracy of PP attachment algorithms and to apply the improved algorithms in an existing text processing system (LaSIE-II).


Similar resources

Corpus Based PP Attachment Ambiguity Resolution with a Semantic Dictionary

This paper deals with two important ambiguities of natural language: prepositional phrase attachment and word sense ambiguity. We propose a new supervised learning method for PP attachment based on a semantically tagged corpus. Because no sufficiently large sense-tagged corpus exists, we also propose a new unsupervised context-based word sense disambiguation algorithm which amends the tra...


A Flexible Unsupervised PP-Attachment Method Using Semantic Information

In this paper we revisit the classical NLP problem of prepositional phrase attachment (PP attachment). Given the pattern V-NP1-P-NP2 in the text, where V is a verb, NP1 is a noun phrase, P is the preposition and NP2 is the other noun phrase, the question asked is where P-NP2 attaches: to V or to NP1? This question is typically answered using both word and world knowledge. Word Sense Disambig...


Combining Unsupervised and Supervised Methods for PP Attachment Disambiguation

Statistical methods for PP attachment fall into two classes according to the training material used: first, unsupervised methods trained on raw text corpora and second, supervised methods trained on manually disambiguated examples. Usually supervised methods win over unsupervised methods with regard to attachment accuracy. But what if only small sets of manually disambiguated material are avail...


Disambiguation of English PP Attachment using Multilingual Aligned Data

Prepositional phrase attachment (PP attachment) is a major source of ambiguity in English. It poses a substantial challenge to Machine Translation (MT) between English and languages that are not characterized by PP attachment ambiguity. In this paper we present an unsupervised, bilingual, corpus-based approach to the resolution of English PP attachment ambiguity. As data we use aligned linguist...


Using Parsed Corpora for Structural Disambiguation in the TRAINS Domain

This paper describes a prototype disambiguation module, KANKEI, which uses two corpora of the TRAINS project. In ambiguous verb phrases of form V NP PP or V NP adverb(s), the two corpora have very different PP and adverb attachment patterns: in the first, the correct attachment is to the VP [...] of the time, while in the second, the correct attachment is to the NP [...] of the time. KANKEI uses various n-gram patterns...




Publication date: 2007